Chapter 13: Exercises

Part 1

Convert a cell array to a structure array: Suppose a cell array A has the following contents:

張惠妹	聽海	1998
周華健	花心	1992
王　傑	一場遊戲一場夢	1988
孫燕姿	遇見	2003
孫燕姿	超快感	2000
F.I.R.	Lydia	2004
蔡依林	愛情36計	2004
王心凌	明天見	2004
Write a script cell2structure01.m that converts this cell array A into a structure array song, where: ... and so on for the remaining entries.
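One possible way to approach such a conversion is sketched below with cell2struct. This is only a minimal sketch on the first three rows of A. The field names artist, title and year are assumptions made for illustration, since the field names are not given here.

```matlab
% First three rows of A, one row per song.
% The field names below (artist, title, year) are assumed, not given by the exercise.
A = {'張惠妹', '聽海', 1998; ...
     '周華健', '花心', 1992; ...
     '孫燕姿', '遇見', 2003};

% cell2struct maps each column of A (dimension 2) to one field.
fields = {'artist', 'title', 'year'};
song = cell2struct(A, fields, 2);   % 3x1 structure array

% Print the first entry as a quick check.
fprintf('%s, %s, %d\n', song(1).artist, song(1).title, song(1).year);
```

The third argument tells cell2struct which dimension of A holds the field values, so its size must equal the number of field names.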
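This exercise uses the structure array song from the previous exercise. Write a script structSort01.m that performs the following two tasks:

Sort the entries of the structure array song by the character codes of the artist names, and print the result to the screen.
Sort the entries of the structure array song by year, and print the result to the screen.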
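The general pattern for both tasks is to sort a key extracted from one field and then reorder the structure array with the returned index. The sketch below uses a small stand-in for song. The field names artist, title and year are assumptions.

```matlab
% Small stand-in for song; the field names are assumed for illustration.
song = struct('artist', {'張惠妹', '周華健', '孫燕姿'}, ...
              'title',  {'聽海', '花心', '遇見'}, ...
              'year',   {1998, 1992, 2003});

% Sort by year: sort the numeric key, then reorder the structure array.
[~, idx] = sort([song.year]);
byYear = song(idx);

% Sort by artist: sorting a cell array of strings compares character codes.
[~, idx] = sort({song.artist});
byArtist = song(idx);

% Print the entries sorted by year.
for i = 1:numel(byYear)
    fprintf('%d %s %s\n', byYear(i).year, byYear(i).artist, byYear(i).title);
end
```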

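This exercise uses the structure array song from the previous exercise. Write a script structCat01.m that performs the following two tasks:

Extract the names of all artists into a cell array of strings, and print the result to the screen.
Extract all the years into a vector, and print the result to the screen.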
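Both extractions rely on the comma-separated list that a field reference on a structure array produces. A minimal sketch follows. The field names artist and year are assumed.

```matlab
% Small stand-in for song; the field names are assumed for illustration.
song = struct('artist', {'張惠妹', '周華健', '孫燕姿'}, ...
              'year',   {1998, 1992, 2003});

artists = {song.artist};   % braces collect the list into a cell array of strings
years   = [song.year];     % brackets concatenate the list into a numeric vector

disp(artists);
disp(years);
```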

Part 2
Create a structure array from a data file: The contents of studentList.txt are shown next:
Source file (13-結構陣列/studentList.txt):
Thomas Wang	72 71 75 81 80 86
Brian Chung	99 95 98 96 99 99
Alexander S.-H. Huang	90 85 87 90 88 92
Simon Lin	65 63 70 63 62 61
Note that each line contains two fields separated by a tab. The first field is "name" and the second field is "score". Please write a MATLAB script to read the file into a structure array "student" such that:

student(i).name: The name of the i-th student
student(i).score: The scores of the i-th student, stored as a column vector

Hint
It is common practice to use tabs to separate fields in a text file.

Hint
You can use textread or textscan to read the whole file into a cell array of strings first, and then use split to split each line into two fields.
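The splitting step described in the hint can be sketched as follows. To keep the sketch self-contained, two lines of the file are given directly as strings instead of being read from disk, and the built-in strsplit is used in place of the Utility Toolbox split. Both choices are assumptions made for illustration.

```matlab
% Two sample lines in the format of studentList.txt (name<TAB>scores).
% In practice these lines would be read from the file, e.g. with textscan.
lines = {sprintf('Thomas Wang\t72 71 75 81 80 86'), ...
         sprintf('Simon Lin\t65 63 70 63 62 61')};

% Preallocate a column structure array with empty fields.
student = struct('name', cell(numel(lines), 1), 'score', []);

for i = 1:numel(lines)
    parts = strsplit(lines{i}, sprintf('\t'));   % {name, score string}
    student(i).name  = parts{1};
    student(i).score = sscanf(parts{2}, '%d');   % sscanf returns a column vector
end
```

Splitting on the tab only keeps names that contain spaces, such as "Alexander S.-H. Huang", in one piece.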
Plot field data in a structure array: Given the structure array student obtained in the previous exercise, write a MATLAB script to plot the score curves of all the students. The figure should look like this:
If you do not have the result from the previous exercise, please run the following script to obtain the student array first:
13-結構陣列/studentArrayCreate.m

Hint
You should try to use vectorized operations as much as you can.

Concatenate fields in a structure array: Suppose a structure variable S is defined by the following statements: In particular, the field "score" stores the quiz scores of each student. Write a one-line MATLAB statement for each of the following tasks:

Compute the quiz average for each student.
Compute the average score of all students for each quiz.
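The idea behind both one-liners is to stack the score vectors into a matrix and average along one dimension. The sketch below uses a hypothetical S, since its defining statements are not shown here. It assumes each score field is a row vector of the same length.

```matlab
% Hypothetical S; each score field is assumed to be a row vector of equal length.
S = struct('name', {'Tim', 'Ann'}, 'score', {[80 90 100], [60 70 80]});

studentAvg = mean(cat(1, S.score), 2);   % average across quizzes, one row per student
quizAvg    = mean(cat(1, S.score), 1);   % average across students, one column per quiz
```

If the score fields were column vectors instead, cat(2, S.score) would be the matching way to stack them, with the two dimensions swapped.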

Function for merging two structure variables: Write a function myStructMerge.m to merge two structure variables with potentially different field names. The returned variable is a structure variable whose field names are the union of the field names of the two input structure variables. Example usage is as follows: The returned structure variable c is (Hint: You may want to use the function fieldnames().)
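The core of such a merge can be sketched with fieldnames() and dynamic field references. The inputs a and b below are hypothetical. The sketch also assumes that when both inputs share a field, the value from the second input wins, which the exercise does not specify.

```matlab
% Hypothetical inputs with one shared field (age).
a = struct('name', 'Tim', 'age', 20);
b = struct('age', 21, 'height', 175);

% Start from a, then copy every field of b into the result.
% Assumption: for shared fields the value from b overrides the one from a.
c = a;
names = fieldnames(b);
for i = 1:numel(names)
    c.(names{i}) = b.(names{i});
end
```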
Function for concatenating two structure arrays: Write a function myStructConcat.m to concatenate two structure arrays with potentially different field names. The returned variable is a column vector of structures that have the field names from both input structure arrays. (Note that the input structure arrays must be converted into column vectors in order to concatenate them.) Example usage is as follows: The returned column vector c has 3 elements: (Hint: You may want to use the functions fieldnames() and setdiff().)
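One way to follow the hint is to use setdiff() to find the fields each input lacks, add them as empty fields, and then concatenate. The inputs below are hypothetical, and filling missing fields with [] is an assumption.

```matlab
% Hypothetical inputs: a is 1x2, b is 1x1, with different field names.
a = struct('name', {'Tim', 'Ann'}, 'age', {20, 22});
b = struct('name', 'Bob', 'height', 175);

% Convert both inputs into column vectors.
a = a(:);
b = b(:);

% Add to a the fields that only b has (filled with [], an assumed default).
missingInA = setdiff(fieldnames(b), fieldnames(a));
for i = 1:numel(missingInA)
    for j = 1:numel(a)
        a(j).(missingInA{i}) = [];
    end
end

% Add to b the fields that only a has.
missingInB = setdiff(fieldnames(a), fieldnames(b));
for i = 1:numel(missingInB)
    for j = 1:numel(b)
        b(j).(missingInB{i}) = [];
    end
end

% Give b the same field order as a, then concatenate vertically.
b = orderfields(b, a);
c = [a; b];
```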
Sort a structure array: A structure array may be sorted by different fields. Write a function structArraySort(structArray, fields) to sort a structure array based on the given fields. Note that "fields" is a cell array of field names, which are used as the sorting keys. Let fields = {$f_1$, $f_2$, ..., $f_n$}. The structure array should be sorted by $f_1$ first. Elements with the same value of $f_1$ should then be ordered by $f_2$, and so on.
Hint: To achieve this goal, your program should sort the structure array successively based on $f_n$, $f_{n-1}$, ..., $f_2$, and $f_1$, using a stable sort.
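The hint works because a stable sort keeps the order established by earlier passes among elements with equal keys. A minimal sketch of the loop follows, using hypothetical data. It assumes each field holds either strings or numeric scalars.

```matlab
% Hypothetical data; each field is assumed to hold strings or numeric scalars.
s = struct('name', {'Bob', 'Ann', 'Tim', 'Ann'}, 'age', {30, 25, 30, 20});
fields = {'name', 'age'};   % primary key first

sorted = s;
for k = numel(fields):-1:1              % least significant key first
    if ischar(sorted(1).(fields{k}))
        keys = {sorted.(fields{k})};    % cell array of strings
    else
        keys = [sorted.(fields{k})];    % numeric vector
    end
    [~, idx] = sort(keys);              % stable: ties keep their current order
    sorted = sorted(idx);
end
```

After the loop, the two entries named Ann appear first, ordered by age.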
Function for ranking bigram counts: Write a function bigramCountFun(str) to return a structure array that contains the 3 most frequent bigrams of the given string, together with their counts. For instance, bigramCountFun('吃葡萄不吐葡萄皮，不吃葡萄倒吐葡萄皮') returns a bigram list with the following values:

bigram(1).word='葡萄', bigram(1).count=4
bigram(2).word='吃葡', bigram(2).count=2
bigram(3).word='吐葡', bigram(3).count=2
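The counting step can be sketched with unique and accumarray. An ASCII string is used here as a stand-in for the input, and this is only one possible approach, not necessarily the intended solution.

```matlab
str = 'abcabcabd';   % ASCII stand-in for the input string

% One bigram per row: each character paired with its successor.
pairs = [str(1:end-1)', str(2:end)'];

% Distinct bigrams (sorted by character code) and the index of each occurrence.
[uniq, ~, id] = unique(pairs, 'rows');
counts = accumarray(id(:), 1);

% Sort counts in descending order; tied bigrams keep their character-code order.
[counts, order] = sort(counts, 'descend');

% Keep at most the top 3 bigrams.
topN = min(3, numel(counts));
bigram = struct('word', cell(1, topN), 'count', []);
for i = 1:topN
    bigram(i).word  = uniq(order(i), :);
    bigram(i).count = counts(i);
end
```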
Find statistics of songs: Given a song list stored as a structure array songList (example mat file), write a function mySongStats.m with the following I/O format:
[out1, out2]=mySongStats(songList);
where

out1 is a structure array that lists the top-10 most productive artists (with field name "name") and their song counts (with field name "songCount"), sorted by "songCount" in descending order. Artists who have the same number of songs should be listed in ascending dictionary order of their names.
out2 is a cell array of strings that lists the artists who have both Chinese and Taiwanese songs (sorted by artist name).
Hints:

For both outputs, you need to remove items whose artist name is "不詳", "unknown", or "老歌".
If two artists perform a song as a duet, they should be treated as a single independent artist. For instance, '江蕙、葉啟田' should be treated as one independent artist if it appears in an entry.
The sorting command "sort" in MATLAB is stable.
Since your program needs to deal with Chinese, make sure it is stored in Big5 encoding. Our judge system will treat your file as in Big5 encoding for further processing. (If you are using MATLAB editor, everything should be ok. If you are using other editors, then you need to pay special attention to this.)
Read an English dictionary:

Write a MATLAB function to read the English dictionary english.wpa and return the result as a structure array "wpa" with two fields, "word" and "pa" (phonetic alphabet, or PA for short). For instance:

wpa(72561).word='multimedia'
wpa(72561).pa{1}='m_ah_l_t_ay_m_iy_d_iy_ah'
wpa(72561).pa{2}='m_ah_l_t_iy_m_iy_d_iy_ah'
Note that each entry in the file is an English word with its pronunciation in phonetic alphabets. A tab separates the word from the phonetic alphabets. If a word has more than one pronunciation, the pronunciations are separated by the pound sign (#).
Find the 5 most frequent beginning letters of the English words and their counts.
Find the 5 most frequent PAs and their counts.
Hint:

Use "textread" to read the whole file into memory first to save time: [word, pa]=textread(wpaFile, '%s\t%s'); Keep in mind that this is a big file with more than 120,000 entries! (If "textread" is not available, try "textscan".)
Use "cat" to put field values into an array.
Use "split" (available in the Utility Toolbox) to split PA into a cell array. (Please use mode=0 to speed up the computation.)
Use "join" (available in the Utility Toolbox) to join a string cell vector into a large string.
Use "elementCount" (available in the Utility Toolbox) to count the elements in a vector.

Bigram probability: The probability that, given a word $w_i$, the next word is $w_{i+1}$ is denoted by $P(w_{i+1}|w_i) = count(w_i w_{i+1})/count(w_i)$. Therefore, for a given 5-word sentence $S w_1 w_2 w_3 w_4 w_5 E$, the probability of the sentence can be approximated by the bigram probability: $P(S) P(w_1|S) P(w_2|w_1) P(w_3|w_2) P(w_4|w_3) P(w_5|w_4) P(E|w_5)$. Based on the corpus of Tang's Poem, estimate the log probability of 「」 based on the bigram probability.
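The computation can be sketched on a toy corpus as follows. The corpus, the single-character words, and the use of 'S' and 'E' as start and end markers are all assumptions made for illustration. The sketch also takes $P(S)$ as 1, so only the conditional terms contribute to the sum.

```matlab
% Toy corpus of three sentences; every character is treated as one word.
% 'S' and 'E' are assumed start and end markers.
corpus = {'SabcE', 'SabdE', 'SacbE'};
text = [corpus{:}];

% count(w): number of occurrences of the string w in the corpus.
countOf = @(w) numel(strfind(text, w));

% Sum the log of P(w_{i+1}|w_i) = count(w_i w_{i+1}) / count(w_i) over the sentence.
sentence = 'SabcE';
logProb = 0;
for i = 1:numel(sentence) - 1
    logProb = logProb + log(countOf(sentence(i:i+1)) / countOf(sentence(i)));
end
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow for long sentences. A bigram that never occurs in the corpus gives a log probability of -Inf.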
Trigram probability: The probability that, given a two-word sequence $w_i w_{i+1}$, the next word is $w_{i+2}$ is denoted by $P(w_{i+2}|w_i w_{i+1}) = count(w_i w_{i+1} w_{i+2})/count(w_i w_{i+1})$. Therefore, for a given 5-word sentence $S w_1 w_2 w_3 w_4 w_5 E$, the probability of the sentence can be approximated by the trigram probability: $P(S w_1 w_2) P(w_3|w_1 w_2) P(w_4|w_2 w_3) P(w_5|w_3 w_4) P(E|w_4 w_5)$. Based on the corpus of Tang's Poem, estimate the log probability of 「」 based on the trigram probability.
Shannon visualization method: We can use the bigram probability to generate Shannon visualization, as follows:

Generate a random initial word by the probability P(S w₁).
Choose a random bigram w₁w₂ according to its probability.
Keep selecting bigrams until we choose E.
String the words together to get the whole sentence.
For instance: S I ===> I want ===> want to ===> to eat ===> eat Chinese ===> Chinese food ===> food E
So the whole sentence is: S I want to eat Chinese food E
Based on the corpus of Tang's Poem, generate the sentence of length 9 (starting with 「黃」) with the maximum probability based on

the bigram probability.
the trigram probability.
the 4-gram probability.
For simplicity, we shall adopt a greedy approach, so that we only need to pick the n-gram with the maximum probability at each iteration. (Hint: You should go over the last 3 exercises in chapter 10 before attempting this exercise.)
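The greedy approach for the bigram case can be sketched on a toy corpus as follows. The corpus, the single-character words, the 'S' and 'E' markers, and the starting word 'a' are all assumptions made for illustration. Because count(current) is the same for every candidate, picking the largest bigram count is the same as picking the largest bigram probability.

```matlab
% Toy corpus; every character is treated as one word, 'S'/'E' mark start/end.
corpus = {'SabcE', 'SabdE', 'SabcE'};
text = [corpus{:}];
vocab = unique(text);          % candidate next words

current = 'a';                 % assumed starting word
sentence = current;
maxLen = 9;

while numel(sentence) < maxLen
    % Count every bigram that starts with the current word.
    counts = arrayfun(@(ch) numel(strfind(text, [current ch])), vocab);
    [best, k] = max(counts);   % greedy choice: the most frequent continuation
    if best == 0 || vocab(k) == 'E'
        break;                 % no continuation found, or end marker chosen
    end
    current = vocab(k);
    sentence(end+1) = current;
end

disp(sentence);
```

The same loop extends to trigrams and 4-grams by conditioning on the last two or three words instead of only the last one.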
MATLAB程式設計：入門篇
